feat: Add Zarr v2 metadata access and basic auth support for OIDC-protected assets#2112
feat: Add Zarr v2 metadata access and basic auth support for OIDC-protected assets#2112
Conversation
Code Coverage (Ubuntu)DetailsDiff against developResults for commit: 9faf04a Minimum allowed coverage is ♻️ This comment has been updated with latest results |
Code Coverage (Windows)DetailsDiff against developResults for commit: 9faf04a Minimum allowed coverage is ♻️ This comment has been updated with latest results |
|
@cauriol, instead of introducing a new method @sbrunato can we get your opinion here ? |
@alambare I agree. |
141c247 to
ca62273
Compare
| def request_asset( | ||
| self, | ||
| url: str, | ||
| ) -> requests.Response: | ||
| """Perform a GET request to the given URL using product's authentication headers.""" | ||
| headers = self.get_storage_options().get("headers", {}) | ||
| return requests.get(url, headers=headers, stream=True) |
There was a problem hiding this comment.
we should use the stream download method from EODAG download plugins instead of creating a new method.
| ) | ||
|
|
||
| if ".zmetadata" in mapper: | ||
| meta = json.loads(mapper[".zmetadata"]) |
There was a problem hiding this comment.
Use orjson instead of json.
| # TODO: Support Zarr v3 when test data becomes available. | ||
| # Zarr v2 uses `.zmetadata`, while Zarr v3 exposes `zarr.json`. | ||
| # The implementation should be straightforward once we can validate it | ||
| # against real examples. | ||
| raise ValueError(f"No Zarr metadata file found at {base_url}") |
There was a problem hiding this comment.
With Destination Earth, we do have a zarr v3 store: DanubeHis data. I don't understand what is blocking here.
| headers = self.get_storage_options().get("headers", {}) | ||
| return requests.get(url, headers=headers, stream=True) | ||
|
|
||
| def list_zarr_files_from_metadata( |
There was a problem hiding this comment.
In zarr, we have keys and not files. I suppose you can rename the method to something like list_zarr_keys ?
| try: | ||
| url = self.assets[asset_key]["href"] if asset_key else self.location | ||
| except KeyError as e: | ||
| raise DatasetCreationError(f"{asset_key} not found in {self} assets") from e |
There was a problem hiding this comment.
You are not creating a xarray here. The exception class is not accurate.
There was a problem hiding this comment.
I put AddressNotFound now
| "zipstream-ng", | ||
| "fsspec", | ||
| "aiohttp", | ||
| "requests" |
There was a problem hiding this comment.
requests is already a dependency in EODAG.
| mapper = fsspec.get_mapper( | ||
| base_url, | ||
| client_kwargs={"headers": headers, "trust_env": False}, | ||
| ) |
There was a problem hiding this comment.
I am not sure about introducing fsspec in EODAG library.
The library is a wrapper above multiple backends. To use it, you need to bring backends to support all the filesystems: s3fs for S3, gcsfs for google cloud storage and adlfs for azure blob storage.
Most of the times, users will store zarr files on a S3 compatible store but some may store them on google or azure stores.
Maybe we don't want to support google / azure stores ? or in an optional dependencies branch ?
My opinion:
- we do not support google and azure yet. Maybe once if we replace
boto3byobstore. - we do not bring
fsspecin EODAG but instead callboto3to get the list of zarr keys. If a dedicated function is needed to get the list of keys with boto3, it should live ineodag/utils/s3.py.
@sbrunato what do you think?
There was a problem hiding this comment.
i think we should not import fsspec directly in eodag and also, not sure if we should handle zarr subsetting from eodag.
Having storage options is great and should be useful to open zarr from xarray or eodag-cube
This PR adds support for accessing Zarr v2 assets from EOProduct and extends OIDC authentication handling to support Basic authorization for protected asset access.
Changes:
-add helper methods in EOProduct to:
- extract auth headers from auth objects
- request protected assets with forwarded authentication headers
- list files from Zarr v2 metadata using .zmetadata
Notes
this PR currently targets Zarr v2 only through .zmetadata
Zarr v3 support can be added later if needed